Designing efficient algorithms for querying large corpora
Authors
Abstract
Similar resources
CorpusReader: designing and querying multi-layer corpora
CorpusReader is a framework for creating and querying multi-layer corpora, which contain several levels of analysis (morphology, syntax, semantics, etc.) and which are aimed at observing correlations between these levels. Building, representing and querying multi-layer corpora is complex. CorpusReader’s specificity essentially lies in merging the outputs of existing corpus analysis tools, avoid...
Efficient Web Crawling for Large Text Corpora
Many researchers use texts from the web, an easy source of linguistic data in a great variety of languages. Building both large and good-quality text corpora is the challenge we face nowadays. In this paper we describe how to deal with inefficient data downloading and how to focus crawling on text-rich web domains. The idea has been successfully implemented in SpiderLing. We present efficiency ...
Designing Practical Efficient Algorithms for Symmetric Multiprocessors
Symmetric multiprocessors (SMPs) dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Yet, the design of efficient parallel algorithms for this platform currently poses several challenges. In this paper, we present a computational model for designing efficient algorithms for symmetric multiprocessors. We then use this model...
Algorithms for Designing Large-Scale Multimedia Servers
In this paper, we propose a novel adaptive admission control algorithm, in which a client is admitted for service by a multimedia server only if the extrapolation from the past measurements of the storage server performance characteristics indicates that the service requirements of all the clients can be met satisfactorily. Each client may request the retrieval of a variable bit rate (VBR) encode...
Querying Annotated Speech Corpora
This paper is concerned with querying annotated speech corpora. A growing number of such corpora is currently being created worldwide; however, their usefulness for a wider research community is restricted by the lack of standard tools for creating, editing, annotating, storing and querying them. Two solutions for these problems are presented here: the XML-based data format TASX for corpus crea...
Journal
Journal title: Oslo Studies in Language
Year: 2021
ISSN: 1890-9639
DOI: 10.5617/osla.8504